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Abstract. We present a new algorithm to compute the classical modular 
polynomial "I"; in the rings Z[X,Y] and {Z/mZ)[X,Y], for a prime I and any 
positive integer m. Our approach uses the graph of Hsogenies to efficiently 
compute mod p for many primes p of a suitable form, and then applies the 
Chinese Remainder Theorem (CRT). Under the Generalized Riemann Hypoth- 
esis (GRH), we achieve an expected running time of 0{l^{logl)^ log log i), and 
compute <I>j mod m using 0{P{logl)^ + 1^ logm) space. We have used the new 
algorithm to compute <I>; with / over 5000, and <I>j mod m with I over 20000. 
We also consider several modular functions g for which is smaller than <&; , 
allowing us to handle I over 60000. 



1. Introduction 

For a prime I, the classical modular polynomial is the minimal polynomial of 
the function j{lz) over the field C(j), where j{z) is the modular j-function. The 
polynomial <I>; parametrizes elliptic curves E together with an isogeny E E' 
of degree I. From classical results, we know that lies in the ring Z[X, F] and 
satisfies = <^i{Y,X), with degree I + 1 in both variables [57, §69]. 

The fact that the moduli interpretation of <i>; remains valid modulo primes p ^ I 
was crucial to the improvements made by Atkin and Elkies to Schoof's point- 
counting algorithm [21, 49]. More recently, the polynomials $/ mod p have been 
used to compute Hilbert class polynomials [2, 53], and to determine the endomor- 
phism ring of an elliptic curve over a finite field [6]. Explicitly computing is 
notoriously difficult, primarily due to its large size. As shown in [19], the logarith- 
mic height of its largest coefficient is %l\o%l + 0{l), thus its total size is 

(1) Oil'logl). 

As this bound suggests, the size of grows quite rapidly; the binary representation 
of <i>7g already exceeds one megabyte, and $659 is larger than a gigabyte. 

The polynomial can be computed by comparing coefficients in the Fourier 
expansions of j{z) and j{lz), an approach considered by several authors [8, 21, 35, 
37, 39, 43, 45]. As detailed in [8], this only requires integer arithmetic and may 
be performed modulo p for any prime p > 21 + 2. The time to compute mod p 
is then 0{l^^^ (iogpY^^), and for a sufficiently large p this yields an 0(1'^^^) time 
algorithm to compute over Z. Alternatively (and preferably), one computes 
modulo several smaller primes and applies the Chinese Remainder Theorem, as 
suggested in [8, 35, 43, 45]. 

An alternative CRT-based approach appears in [16]. This algorithm uses iso- 
genics between supersingular elliptic curves defined over a finite field, and com- 
putes modp in time 0(Z''+^(logp)^+'' + (logp)**+''), under the GRH. Although 
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slower than using Fourier expansions, this approach relies solely on the fact that 
$j parametrizes isogenies. 

In [23], Enge uses interpolation and fast floating-point evaluations to compute 
€ Z[X, y] in time 0{l^ {log l)^'^'^), under reasonable heuristic assumptions. The 
complexity of this method is nearly optimal, quasi- linear in the size of $/. How- 
ever, most applications actually use $; in a finite field Fpn , and mod p may be 
much smaller than $j. In general, Enge's algorithm can compute and reduce 
it modulo p much faster than either of the methods above can compute mod p, 
but this may use an excessive amount of space. For large I this approach becomes 
impractical, even when mod p is reasonably small. 

Here we present a new method to compute <&; , either over the integers or modulo 
an arbitrary positive integer m, including m < I. Our algorithm is both asymp- 
totically and practically faster than alternative methods, and achieves essentially 
optimal space complexity. More precisely, we prove the following result. 

Theorem 1. Let I denote an odd prime and m a positive integer. Algorithm 6.1 
correctly computes $; e (Z/toZ)[X, y]. Under the GRH, it runs in expected time 

O(;3log3/loglog0, 

using 0(Plog/m) expected space. 

To compute over Z, the modulus m is made large enough to uniquely deter- 
mine the coefficients, via an explicit height bound proven in [12]. Our algorithm 
is of the Las Vegas type, a probabilistic algorithm whose output is unconditionally 
correct; the GRH is only used to analyze its running time. We have used it to 
compute for all I < 3600, and many larger / up to 5003. The largest previous 
computation of which we are aware has / equal to 1009. Working modulo m we can 
go further; we have computed $/ modulo a 256-bit integer m with I = 20011. 

Applications that rely on can often improve their running times by using 
alternative modular polynomials that have smaller coefficients. Our algorithm can 
be adapted to compute polynomials $f relating g{z) and g{lz), for modular func- 
tions g that share certain properties with j. This includes the cube root 72 of j, 
and we are then able to compute mod m more quickly by reconstructing it from 
'^'J^ mod m, capitalizing on a suggestion in [21]. Other examples include simple 
and double eta- quotients, the Atkin functions, and the Weber f-function. The last 
is especially attractive, since the modular polynomials for f are approximately 1728 
times smaller than those for j. This has allowed us to compute modular polynomials 
$j with I as large as 60013. 

The outline of this article is as follows. In Section 2 we give a rough overview of 
our new algorithm. The theory behind the algorithm is presented in Sections 3-5. 
We present the algorithm, prove its correctness and analyze its runtime in Section 6. 
Section 7 deals with modular polynomials for modular functions other than j, and 
a final Section 8 contains computational results. 

2. Overview 

Our basic strategy is a standard CRT approach: we compute mod p for various 
primes p and use the Chinese Remainder Theorem to recover e Z [X, Y] . Alterna- 
tively, the explicit CRT (mod m) allows us to directly compute <l>; G (Z/mZ)[X, y], 
via [4, Thm. 3.1]. By applying the algorithm of [53, §6], this can be accomplished 
in 0(Z^ log Im) space, even though the total size of all the mod p is 0{l^ log /). 
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Our method for computing $/ mod p is new, and applies only to certain primes p. 
Strategic prime selection has been used effectively in other CRT-based algorithms, 
such as [53], and it is especially helpful here. Working in the finite field Fp, we select 
I + 2 distinct values ji, compute ^i{X,ji) e Fj,[X] for each, and then interpolate 
the coefficients of e (rp[F])[X] as polynomials in Fp[F]. The key lies in our 
choice of p, which allows us to select particular interpolation points that greatly 
facilitate the computation. We are then able to compute mod p in expected time 

(2) 0{f{logp)'log\ogp). 

In contrast to the methods above, this is quasi-linear in the size of $/ mod p. 

Our algorithm exploits the structure of the Z-isogeny graph Gi defined on the set 
of j-invariants of elliptic curves over Fp. Each edge in this graph corresponds to 
an Z-isogeny; the edge (ji, ^2) is present if and only if ^2) — 0. As described 

in [28, 40], the ordinary components of this graph have a particular structure known 
as an I- volcano. Depicted in Figure 1 are a set of four Z-volcanoes, each with two 
levels: the surface (at the top), and the floor (on the bottom). Note that each vertex 
ji on the surface has I + 1 neighbors; these are the roots of ^i{X,ji) E Fp[X], and 
there are at least I + 2 such ji . 




FIGURE 1. A set of ^-volcanoes arising from Theorem 4.1. In 
this example I = 7 splits into ideals of order 3 in cl(0) and we 
have h{0) = 12 surface curves and h{R) = 72 floor curves. 



This configuration contains enough information to compute the 1 + 2 polynomials 
^i{X,ji) that we need to interpolate ^i{X,Y) mod p. It is not an arrangement 
that is likely to arise by chance; it is achieved by our choice of the order O and the 
primes p that we use. To further simply our task, we choose p so that vertices on 
the surface correspond to curves with Fp-rational Z-torsion. Our ability to obtain 
such primes is guaranteed by Theorems 4.1 and 4.4, proven in Section 4. 

The curves on the surface all have the same endomorphism ring type, isomorphic 
to an imaginary quadratic order O. Their j-invariants are precisely the roots of the 
Hilbert class polynomial Hq € Z[A"]. As described in [2], the roots of Ho may be 
enumerated via the action of the ideal class group cl(0). To do so efficiently, we 
use an algorithm of [53] to compute a polycyclic presentation for cl{0) that allows 
us to enumerate the roots of Hq via isogenics of low degree, typically much smaller 
than I. We may use this presentation to determine the action of any element of 
cl(0), including those that act via /-isogenics. This allows us to identify the l- 
isogeny cycles that form the surfaces of the volcanoes in Figure 1. 

Similarly, the vertices on the floor are the roots of H^, where R is the order of 
index I in O, and we use a polycyclic presentation of cl(i?) to enumerate them. To 
identity children of a common parent (siblings), we exploit the fact that siblings 
lie in a cycle of Z^-isogenies, which we identify using our presentation of cl(i?). It 
remains only to connect each parent to one of its children. This may be achieved 
by using Vein's formula [54] to compute an /-isogeny from the surface to the floor. 
By matching each parent to a group of siblings, we avoid the need to compute an 
Z-isogeny to every child, which is critical to obtaining the complexity bound in (2). 
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Below is a simplified version of tlie algoritlim to compute mod p. 
Algorithm 2.1. Given I, p, and O, compute <I>( modp as follows: 

1. Find a root of Ho over Fp. 

2. Enumerate the roots ji of Ho and identify the Z-isogeny cycles. 

3. For each ji find an Msogenous j on the floor. 

4. Enumerate the roots of H^ and identify the Z^-isogeny cycles. 

5. For each ji compute <Pi{X,j^) = Yl(j,,j^)eGi ~ Jk)- 

6. Interpolate S (Fp[y])[X] using the ji and the polynomials ^i{X,ji). 

Algorithm 2.1 assumes that I, p, and O satisfy the conditions of Theorem 4.1, 
and that h{0) > 1 + 2. We use the same O for each p, so Ho may be precornputcd. 
Note that we do not compute H^, we enumerate its roots by applying the Galois 
action of cl(i?) to a root obtained in Step 3. 

A more detailed version of Algorithm 2.1 appears in Section 6 together with 
Algorithm 6.1, which selects the order O and the primes p, and performs the CRT 
computations needed to determine over Z, or modulo m. 

3. Orders in imaginary quadratic fields 

It is a classical fact that the endomorphism ring of an ordinary elliptic curve 
over a finite field is isomorphic to an imaginary quadratic order O. The order O 
is necessarily contained in the maximal order Ok of its fraction field K, but we 
quite often have O C Ok- As most textbooks on algebraic number theory focus on 
maximal orders, we first develop some useful tools for working with non-maximal 
orders. To simplify the presentation, we work throughout with fields of discriminant 
dK < —4, ensuring that we always have the imit groups O* = = {±1}. We 
use (jy) to denote the Kronecker symbol, which is —1, 0, or 1 as the prime p splits, 
ramifies, or remains inert in K (respectively). 

Let O be a (not-necessarily maximal) order in a quadratic field K of discriminant 

(Ik < —4. Let N be a positive integer prime to the conductor u = [Ok '■ O]. The 
order R = Z + NO has index N in O, and its ideal class group cl(i?) is an extension 
of cl(0). More precisely, as in [52, Thm. 6.7], there is an exact sequence 

(3) 1 {0/NOy/{Z/NZ)* cl{R) cl{0) — ^ 1, 

where <^ maps the class [I] to the class [lO] . For i?- ideals prime to uN, the underly- 
ing map / I— lO preserves 'norms', that is, [R: I] = [O : lO], as in [20, Prop. 7.20]. 
We have a particular interest in the kernel of the map ip. 

Lemma 3.1. In the exact sequence above, if N = p" is a power of an unramified 
odd prime p, then ker ^p is cyclic of order (p — (^) ) . 

Proof. We compute the structure of ker(p = {0/p"0)* /(Z/p"Z)*. The group 
(Z/p"Z)* is cychc, isomorphic to the additive group (Z/(p — 1)Z) x (Z/p"~'^Z). 
We now apply [17, Cor. 4.2.11] to compute the structure of (0/p"0)*: 



(4) (0/p"0)* ^ 



(Z/(p - 1)Z)2 X (Z/p"-iZ)2, if p splits in K; 
(Z/(p2 _ i)Z) X (Z/p"-iZ)2, if p is inert in K. 



MODULAR POLYNOMIALS VIA ISOGENY VOLCANOES 



5 



In both cases, the factor {Ti/p^ ^Z) of (Z/p"Z)* is a maximal cyclic subgroup of 
the Sylow p-subgroup of (O/p'^O)*, and must correspond to a direct summand. 
Thus the p-rank of the quotient (0^"0)7(Z/p"Z)* is 1. The order of the factor 
(Z/ 1)Z) of (Z/p"Z)* is not divisible hyp, and must correspond to a subgroup of 
a cyclic factor of {O/p^O)* in both cases. It follows that the quotient is cyclic. The 
calculations above also show that #(e»/p"C»)7(Z/p"Z)* = - (^)). □ 

Even when kcr if is not necessarily cyclic, the size of kcr ip is as in Lemma 3.1. More 
generally, the exact sequence (3) can be used to derive the formula 

(5) h{0) = h{OK)uJl(l- 

p\u 

as in [20, Thm. 7.24]. 

We now describe a particular representation of ker ip when N = I is prime. In 

this case kcrp is cyclic, of order / — (^); this follows from Lemma 3.1 for I > 2, 
and from (5) for I = 2. Let O = Z[r] for some t G K that is coprime to I. There 
are exactly I + 1 index I subrings of O: the order R, and rings Si = IZ + {t + i)Z, 
for i from to ? — 1. Each O-ideal of norm I corresponds to one of the Si. The 
remaining Si are fractional invertible i?-idcals corresponding to proper ii-ideals 

(6) = IS^ = fZ + 1{t + i)Z, 

for which R = {13 G K : /3JiC Ji}. Exactly 1 + (^) of the S^ are O-ideals, leaving 
I — 1 — (^) proper J?-ideals J^. These are all non-principal and inequivalent in 
cl(i?), and each lies in keT<f, since we have JjO = lO. The invertible Jj are exactly 
the non-trivial elements of ker ip. We summarize with the following lemma. 

Lemma 3.2. If N = I is prime in the exact sequence (3), then the R-ideal IR and 
the invertible R-ideals Ji defined in (6) are representatives for keri^. In particular, 
ker if is generated by the class of an invertible R-ideal with norm P . 

This representation of ker has proven useful in other settings [15]. We use it to 
obtain the i^-isogeny cycles we need in Step 4 of Algorithm 2.1. 

We conclude this section with a theorem that allows us to construct arbitrarily 
large class groups that are generated by elements of bounded norm. 

Theorem 3.3. Let O be an order in a quadratic field of discriminant dx < —4, 
and let p \ disc(O) be an odd prime. Let V be a set of primes that do not divide 
p[Ok'-0]. For n G Z>o, let Rn denote the order Z +p'^0, and let Gn be the 
subgroup o/cI(i?„) generated by the set S'„ of classes of Rn -ideals with norms in V. 
Then if G2 — cl(i?2), we have Gn = cl(i?„) for every n € Z>o. 

Proof. For each i?„, let ipn ■ cl(i?„) — > cl(C') denote the corresponding map in 
the exact sequence (3), and let (j)n+i- cK^n+i) ^ cl(i?Ar) send [/] to [/i?„], so 
that pn+i = ''Pn ° ipn+i- These are all surjective group homomorphisms, and the 
underlying ideal maps preserve the norms of ideals prime to p[Ok ■ O]. We assume 
G2 = cl(i?2)j which implies Gn = cl(i?„) for n < 2, and proceed by induction on n. 

For each prime q Q V and every n, there are exactly 1 — (^) ideals in i?„ of 
norm q, and maps Sn+i onto Sn and Gn+i onto Gn. By the inductive hypoth- 
esis, Gn = cl(i?„), therefore G„+i intersects every coset oikei(j)n+i C keiipn+i- To 
prove Gn+i = cl(J?„+i), it suffices to show ker<^„_|_i c Gn+i- 




6 



REINIER BROKER, KRISTIN LAUTER, AND ANDREW V. SUTHERLAND 



The groups ker(^„ and ker(p„_(_i are cyclic, by Lemma 3.1, since p is odd and 
unramified. Let q;„ be a generator for ker(^„. Since #ker(/3„ is divisible by p, 
a„ cannot be a pth power in ker(p„. Expressing a„ in terms of Sn, we see that 
4'n+ii'^n) must intersect Let q;„+i lie in this intersection, and note that 

an+i € keT(fn+i- The order of a„+i must be a multiple of |a„| = #keri^„, and 
a„+i cannot be a pth power in keiipn+i- It follows that a„+i has order #keryi„+i, 
hence it generates keripn+i, proving kcr</?„+i C G„+i as desired. □ 

To see Theorem 3.3 in action, let O be the order of discriminant D = —7, let 
p = 3, and let V = {2}. The class group of the order ii„ of discriminant 3^"D 
happens to be generated by an ideal of norm 2 when n = 2, and the theorem then 
implies that this holds for all n. This allows us to construct arbitrarily large cyclic 
class groups, each generated by an ideal of norm 2. 

We remark that Theorem 3.3 may be extended to handle p = 2 ii the condition 
G2 = cl(i?2) is replaced by G3 = cl(i?3), and easily generalizes to treat families of 
orders lying in O that have 6-smooth conductors, for any constant b. 



4.1. The theory of complex multiplication (CM). As in Section 3, let O be 
an order in a quadratic field K of discriminant (Ik < —4. We fix an algebraic 

closure oi K. It follows from class field theory that there is a unique field Kq with 
the property that the Artin map induces an isomorphism 



between the Galois group of Kq/K and the ideal class group of O. The field Kq 
is called the ring class field for the order O. If O is the maximal order of K, then 
Ko is the Hilbert class field of K, the maximal totally unramified abelian extension 
of K. In general, primes dividing [Ok '■ O] ramify in the ring class field. 

The first main theorem of complex multiplication [20, Thm. 11.1] states that 



for any complex elliptic curve E with endomorphism ring O. Furthermore, the 

minimal polynomial Hq oi j(E) over K actually has coefficients in Z, and its degree 
is h{0) = 101(0)1. The polynomial Hq is known as the Hilbert class polynomial. 
If p is a prime that splits completely in the extension Ko/Q, then Hq splits into 
distinct linear factors in Fp[X]. Its roots are the j-invariants of the elliptic curves 
E/Fp with End(£;) ^ O, a set we denote EUo(Fp). Let D = disc(O). The primes 
that split completely in Ko are precisely the primes p\ D that are the norm 



of an element of O. The equation 4:p = t^ — v'^D is often called the norm equation. 

For a positive integer N, there is a unique extension Km,o of the ring class field 
Ko such that the Artin map induces an isomorphism 



4. Explicit CM theory 



G&\{Ko/K) 



c\{0) 



Ko = K{j{E)) 




G(i\{KN^o,Ko) 



{o/Noy/{±i}. 



The field K^^^q is the ray class field of conductor N for O. When O = Ok, this is 
simply the ray class field of conductor N, and for A'' = 1 we recover the ring class 
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field Ko = Ki^Q. The ring class field of the order i? = Z + NO is a subfield of 
the ray class field Kj^^q- The Galois group of Kji/Kq is isomorphic to 

{0/N0)*/{7./NZ)*, 

the kernel of the map <^ in (3). 

The second main theorem of complex multiplication [20, Thm. 11.39] states that 

Kn^O = Ko{x{E[N])), 

where x{E[N\) denotes the set of a;-coordinates of the A'^-torsion points of an elliptic 
curve E with endomorphism ring O. The Galois invariance of the Weil pairing 
E[N] X E[N] /ijv implies that the cyclotomic field Q(CAf) is contained in the 
ray class field -fCjv.o (a fact that also follows directly from class field theory). In 
particular, a prime p that splits completely in Kj^^o also splits completely in Q(Cjv), 
and is therefore congruent to 1 modulo A''. 

4.2. Primes that split completely in the ray class field. Wc arc specifically 
interested in primes p that split completely in the ray class field Ki^o, where I is 
an odd prime. For such p we can achieve the desired setting for Algorithm 2.1, as 
depicted in Figure 1. 

Theorem 4.1. Let I > 2 be prime, and let O ct Z[2],Z[C3] be an imaginary qua- 
dratic order that is maximal at I. Let R = Zi + lO be the order of index I in O. 
Let p be a prime that splits completely in the ray class field Ki q, but does not split 
completely in the ring class field for the order of index P inO. 

(1) There are exactly h{0) different Fp-isomorphism classes of elliptic curves 
E/Fp with endomorphism ring O that have E[l] C E{Fp). 

(2) There are exactly h{R) different Fp-isomorphism classes of elliptic curves 
E/Fp with endomorphism ring R that have an Fp-rational P -torsion point. 

Proof. The inchisions Kq C C Ki^o imply that both Hq and Hfi split into 
linear factors in F^. Each j-invariant in Elle)(Fp), resp. Ellii(Fp), corresponds 
to two distinct isomorphism classes over Fp, since these curves are ordinary and 
O 7^ Z[i], Z[C3]. Wc will show that exactly one of these satisfies (1), resp. (2). 

Since p splits completely in the ray class field Ki^o, we can factor p = TTpWp G O 
with TTp = 1 mod lO. Let E/Fp be an elliptic curve with endomorphism ring O 
whose Frobenius endomorphism corresponds to TTp under one of the two isomor- 
phisms End(£;) ^ O. Since TTp = 1 mod lO, we have E[l] C E{Fp). The Frobe- 
nius endomorphism of the non- isomorphic quadratic twist E/Fp corresponds to 
—TTp, and we then have #i?(Fp) = p+1 — tr(— TTp) = 2 mod I. For I ^ 2 this implies 
that E has trivial /-torsion over Fp. proving (1). 

To prove (2), let E' /Fp be a curve with endomorphism ring R that is /-isogenous 
to E. The Frobenius endomorphism of E' also corresponds to TTp under an isomor- 
phism End(ii") R. The cardinality of _B'(Fp) is thus equal to the cardinality 
of E{Fp) and therefore divisible by . However, since p docs not split completely 
in the ring class field of index P in O, we cannot have TTp = 1 mod IR. It follows 
that E'[l] (f. E'{Fp) and E'{Fp) must contain a point of order P. As above, the 
quadratic twist of E' must have trivial /-torsion over Fp, proving (2). □ 

Provided the order O in Theorem 4.1 also satisfies h{0) > 1 + 2, we can achieve 
the desired setting for Algorithm 2.1. We say such an order is suitable for I. 
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To determine the coefficients <i>; via the Chinese Remainder Theorem, we need to 
compute $; mod p for many primes p satisfying Theorem 4.1. We necessarily have 
p> I, since p=l mod I, and the height bound [19] on the coefficients of implies 
that 61 + 0(1) primes suffice. We now show these primes exist and bound their 
size, assuming the GRH. For this purpose we define a suitable family of orders. 

Definition 4.2. Let S be the set of odd primes and let T be the set of all imaginary 
quadratic orders. A suitable family of orders is a function T : S ^ T such that: 

(1) for alll £ S the order T{1) is suitable for I. 

(2) there exist effective constants c\,C2 G R>o such that for alll G S the bounds 
l + 2< h{F{l)) < cil and P < \ disc(7'(Z))| < csZ^ hold. 

Example 4.3. Let J"(3) = Z[\/=47], and for I > 3 let T{1) be the order O of 
discriminant —7-3^", where n is the least integer for which h(0) = 2-3"^^ >l + 2. 
Letting Ci = 4 and c^ = 205, we see that T is a suitable family of orders. 

Theorem 4.4. Let be a suitable family of orders and let cq S R>o be an arbitrary 
constant. Then for each prime I > 2 the set of primes p for which I, O = J^{1) 
and p satisfy the conditions of Theorem 4-1 has positive density. 

Assuming the GRH, there is an effective constant c € R>o such that at least 
coZ^(logZ)^ of these primes are bounded by B = d^(logZ)^, for all primes I > 7. 

Proof. For a prime Z > 2, let O = have fraction field K, and let u = [Ok '■ O]. 

The ray class field Ki_o and the ring class field Ks for the order 5 = Z + PO 
are both invariant under the action of complex conjugation, hence both are Galois 
extensions of Q. One finds that 



#G&l{Ki,o/Q)=2{l-l){l-{^))h(0) < 2l{l - {^))h{0) = #G&\{Ks/Q), 



and the Chcbotarcv density theorem [47, Thm. 13.4] yields the unconditional claim. 

To prove the conditional claim, we apply an effective Chebotarev bound to the 
extension Ki^o/Q, assuming the GRH for the Dedekind zeta function of Ki^o- 

The extension Ki o/K is abelian of conductor dividing lu, with degree nh{0), 
where n < 2#{0/lO)* < 2f. The Ox-ideal disc{Ki^o/0) is a divisor of (Zm)"'*(^), 
by Hasse's Fiihrerdiskriminantenproduktformel [47, Thm. VII. 11. 9]. We then have 



where disc(C') < C2P. Using the bound h{0) < cil, Theorem 1.1 of [41] then yields 



where 7r{x, Ki^o/Q) counts the primes up to a; e R>o that split completely in Ki^o, 
and C3 G R>o is an effectively computable constant, independent of I. 

If we now suppose x = c/^(logZ)^, and apply Li{x) ~ xlogx and nh{0) < cil^, 
we may choose c e R>o so that Li{x) / {2nh{0)) is greater than the RHS of (7) 
by an arbitrarily large constant factor. In particular, for any C4 € i?>o there 
is an effectively computable choice of c that ensures Tr{x,Ki^o/Q) > Cil^{\ogl)^, 
independent of I. Moreover, for the least such c we have c/c4 — )• 1 as C4 00. 



|disc(i^,,o/Q)| = |iVx/Q(disc(if;,o/i^)) •disc(/^/Q)[^'.^^^]| 
< (Z/)2"''(0)|disc(i<:/Q)|"'^('^) < {C2i^r''^^\ 
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We now show that most of these primes do not spht completely in Ks- Any 
prime p that splits completely in Ki q must split completely in the ring class field 
for i? = Z + lO. Putting D = disc(C'), we then have 

(8) Ap = t^- v'^fD, 

with t,v € Z>o and t = 2 mod /. If v ^ mod /, then p cannot split completely 
in Ks- For p < cf{logl)'^, we have v < 2c^^'^l{logl)^ and t < 2c^/'^l^{loglf, since 
D > hence there are at most 2c^/^(log/)^ positive v = mod Z, and at most 
2c^/'^l'^{\ogl f + 1 positive t = 2 mod I, that satisfy (8). 

It follows that no more than 4c/^(logZ)'' + 2c^/^(logZ)^ primes p < d®(log?)^ 
split completely in Ks- For a sufficiently large choice of C4, we can choose c so that 
Tr{x,Ki^o/Cl) > C4p(logl)'^ and at the same time ensure that 

C4Z3(log/)3 - 4cP(\oglf - 2cy^{loglf > co^^(log^)^ 
for all primes / > 7, since wc then have Z/(log/) boimded above 4. □ 

Theorem 4.4 guarantees we can obtain a sufficient number of primes p for use 
with Algorithm 2.1. In fact, as is typical for such bounds, it provides far more than 
we need. The task of finding these primes is addressed in Section 6.1. 

4.3. Computing the CM action. The Galois action of Gal{Ko/K) = cl{0) on 
the set Ellci(Fp) may be explicitly computed using isogenics, as described in [2]. 
Let the prime p split completely in the ring class field Kq, and let E/Fp be an 
elliptic curve with Fjnd{E) = O. Fixing an isomorphism End(£^) O, for each 
invertible O-ideal a we define 

E[a] ={Pg E(Fp) I Vr e a : r(P) = 0}, 

the 'a-torsion' subgroup of E. The subgroup E[a] is the kernel of a separable isogeny 
E E/E[a] of degree [0:a], with End(^/E[a]) ^ O. This yields a group action 

j{Er = j{E/E[a]), 

in which the ideal group of O acts on the set Ellc)(Fp). This action factors through 
the class group, and the cl(C')-action is transitive and free. Equivalently, Ello(Fj,) 
is a torsor for d{0); for each pair (71,^2) of elements in EUq there is a unique 
element of cl(0) whose action sends ji to 72- 

Now let [q be an invertible O-ideal of prime norm Iq ^ p. The curves E and 
E/E[Iq\ are Zo-isogenous, hence 

^;o(jo,Jo°) = 0, 

where jo = j{E). To compute the action of k), we need to find the corresponding 
root of ^ig{X,jo) e Fp[-^]- We assume that ^(^(X, F) is known, either via one of 
the algorithms from the introduction, or by a previous application of Algorithm 6.1. 
The polynomial jo) G Fp[X] has either 1 or 2 roots that he in Ellci(Fp), 

depending on whether Zq ramifies or splits (it is not inert). These roots correspond 
to the actions of [q and its inverse Iq, which coincide when Iq ramifies. 

Our fixed isomorphism End(£') O maps the Frobenius endomorphism of E 
to an element tt^ € O C Ok with norm p. We then have the norm equation 

(9) Ap^t^ - v^dK, 

where t = tr(7rp), and v is the index of Z[7rp] in Ok- When Iq does not divide v, 
the order Z[7rp] is maximal at Zq arid the only roots of ^(^(Xjjo) over Fp are those 
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in E\\o{Fp). Otherwise 'Pig{X,jo) has I + 1 roots in Fp, and those in EUc)(Fp) he 
on the surface of the Zg-volcano containing j, as described in [28]. The roots on the 
surface can be readily distinguished, as in [53, §4], for example, but typically we 
choose p with Iq] v so that every root of jo) in Fp is on the surface. 

When Iq splits and does not divide v, the actions of [q and Iq-'^ may be distinguished 
as described in [11, §5] and [29, §3]. The kernels of the two lo-isogenies are subgroups 
of E[Iq]. a standard component of the SEA algorithm computes a polynomial 
Fi^^{X), whose roots are the abscissa of the points in one of these kernels [21, 49]. 
In our setting Iq splits in Z[7rj,], and provided Iq f v, the action of np on E[lo] has two 
distinct eigenvalues corresponding to the two kernels. Expressing the ideal Iq in the 
form {lo,c + d-Kp) yields the eigenvalue A = —c/d mod ■ Wc may then use Fi^ {X) 
to test whether TTp's action is equivalent to multiplication by A in the corresponding 
kernel. See [11] for an example and further details. 

As a practical optimization (see Section 6.7), we avoid the need to ever make 
this distinction. The asymptotic complexity of computing the action of Iq is the 
same in any case. 

Lemma 4.5. Let Iq and p be distinct odd primes, and let O ^ Z[i],Z[C3] be 
an imaginary quadratic order. Let jo — j{E) € Ellci(Fp), fix an isomorphism 
End(ii') O, and let TTp G O denote the image of the Frobeni/us end,om,orphism. 
Let lo he an invertible O-ideal of norm Iq, and assume Z[7rp] is maximal at lo- 

Given e Fp[X, F], the j-invariant may be computed using an expected 
0{Iq + M(Zo) logp) operations in Fp. 

Here M (n) denotes the complexity of multiplying two polynomials of degree less 
than n, as in [55, Def. 8.26]. Naively, M(n) = O(n^), Karatsuba's algorithm yields 
M(n) = 0(n'°^2^), and FFT-based methods achieve M(n) = O(nlognloglogn). 

Proof. We first compute gcd{XP — X, $/„ {X, jo)), the product of the distinct linear 
factors of jg) over Fp. Instantiating f{X) ~ <i>;„(X, jo) uses 0{Iq) operations 

in Fp, exponentiating X^ mod / uses 0(M(^o) logp) operations in Fp, and the fast 
Euclidean algorithm [55, §11.1] obtains gcd(XP-X, /) using 0{M{lo) log Iq) = 0(Z§) 
operations in Fp. This gcd has degree at most 2, since Z[7rp] is maximal to Iq, and 
we may find its roots using an expected O(logp) Fp-operations [55, Cor. 14.16]. 

The desired root jg" is then distinguished as outlined above. We first compute 
the eigenvalue A — c/d mod Iq, where lo — (k),c + diTp), using 0{Iq) bit opera- 
tions. Applying [9, Thm. 2.1], the kernel polynomial _F;„(A') can be computed 
using 0(M(^o)) operations in Fp. To compare {XP,YP) to the scalar multiple 
A • {X,Y), wc compute X^, Y^, and the required division polynomials ^ljn{X,Y), 
modulo Fig{X) and the curve equation for E, as in the SEA algorithm [7, Ch. VII]. 
This uses O((log/o + ^ogp)M{lo)) = 0{ll + M{lo) logp) operations in Fp. □ 

5. Mapping the CM torsor 

The previous section made explicit the Galois action corresponding to an element 
of cl(C') = Gal{Ko/K) represented by an ideal Iq of prime norm Iq. We now use 
this to enumerate the set Ello(Fp), and at the same time compute a map that 
explicitly identifies the action of each element of cl(C'). To do this efficiently it is 
critical to work with generators whose norms are small, since the cost of computing 
the action of increases quadratically with its norm. 
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5.1. Polycyclic presentations. As a finite abelian group, each element of cl{0) 
can be uniquely represented using a basis. However, as noted in [53, §5.3], the norms 
arising in a basis may need to be much larger than those in a set of generators. 
Thus we are led to consider polycyclic presentations. 

Lot a = («!, . . . , afe) be a sequence of generators for G, and let Gi = {ai, . . . , a,) 
denote the subgroup generated by ai, . . . , a^. The composition series 

1 = Gq < Gi < ■ ■ ■ < Gk-i < Gk ~ G, 

is then polycyclic, meaning that each quotient Gi+x/Gi is a cyclic group. The 
sequence r(a) = (ri, . . . ,rfe) of relative orders for a is defined by 

ri = \Gi : Gi-i\. 

Each Tj necessarily divides \ai\, and for i > 1 we typically have rj < \ai\. The 
sequences a and r(a) allow us to uniquely represent each /3 € G in the form 

(10) /3 = a"-a^l•••a^^ 

where x = {xi, . . . ,Xk) with < a;, < r,. The vector s{a,i) = x for which 
= a*" has xj = for j > i, and is called a power relation, see [36, §8.1]. A 
generic algorithm to compute r{a) and the s{cx,i) can be found in [53, Alg. 2.1]. 
The vector x in (10) is the discrete logarithm or exponent vector of /3. We let 

X{a) = {a; e Z'' : < Xi < r^}, 

and note that the map x cx'^ defines a bijcction from X{a) to G. 

We now consider the case G = cl(C'), where O is an order in a quadratic field 
K of discriminant (Ik < —4. Let V = {pi,P2,P3, • • •) be an increasing sequence of 
primes with the property that c\{0) is generated by the classes of invertible ideals 
with norms in V. By Dirichlet's density theorem [20, Thm. 9.12], any sequence 
containing all but a finite set of primes works, and from [20, Cor. 7.17] we know 
that some finite prefix of V actually suffices. For O = Ok we may take V to be 
the sequence of primes less than |dif/3|^/^, by [13, Prop. 9.5.2]. 

There is a unique lexicographically minimal subsequence (Zi, . . . , Zfc) of 'P that 
corresponds to a polycyclic sequence a = (ai, . . . , a^.) for 01(0) in which ai is 
represented by an ideal of norm /j, and ria) has rj > 1. When li splits there are 
two possibilities for aj. To fix a choice, let be the ideal class represented by the 
unique binary quadratic form ax^ + hxy + cip' of discriminant D = disc(C') with 
a = li and b nonnegative [13, §3.4], corresponding to the ideal = {k, (— 6+-\/I?)/2). 

We call a the polycyclic presentation of c\{0) determined by V. We use Z(a) 
to denote the sequence of norms {li, . . . ,1^), but note that this also depends on V; 
each li is the least prime in V that is the norm of an ideal in ctj. 

We may compute a by applying [53, Alg. 2.1] to an implicit sequence of gen- 
erators 7 = (71,72,73, • . •) corresponding to the subsequence of V for which there 
exists an invertible O-ideal of norm pi. The algorithm computes rj for each 7^ in 
turn, and if we find that Tj > 1, we append 7^ to an initially empty vector a. We 
terminate when 0''* = h{0), a value which we assume has been precomputed. 

The computation of a uses |G| — h{0) group operations in G = c\{0), and 
creates a table T : X{oi) — )• G that stores |G| = h{0) group elements [53, Prop. 6]. 
Using binary quadratic forms to represent 01(0), the group operation has bit- 
complexity 0(log^ I disc(O)l), as shown in [5], and each element may be stored in 
0(log I disc(O)l) space. Evaluating T(x), or T-i(^), has bit-complexity 0(log |G|). 
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5.2. Suitable presentations. When V is the sequence of all primes, the norms 
l{a) for thc> polycyclic presentation a of 01(0) determined by V are as small as 
possible. However, when working in a finite field Fp, we may wish to ensure that 
each norm U does not divide v = [Ok ■ Z[7rp]], as noted in Section 4.3. This is 
achieved by excluding from V primes that divide v, and we call the corresponding 
a the presentation of cl(C) suitable for p. This may cause us to use norms that are 
slightly larger than optimal. We now show that, provided we work with a family 
of orders that satisfies certain (easily met) constraints, the norms in every suitable 
presentation are quite small, assuming the GRH. 

Theorem 5.1. Let C3 G R>o be a fixed constant, and let T be a suitable family of 
orders with the following additional property: if O is an order in T whose fraction 
field K has discriminant dx, and so denotes the square-free part of [Ok '■ O], then 
So is coprime to 2dk and both so and are bounded by C3. 

Then under the GRH, for every O in and every prime p that splits completely 
in Ko, the presentation a of d{0) suitable for p has norms l{a) for which 

maxZ(a) < clo{v) \og{uj{v) + 1), 

where v is defined by 4p — t^ — f^disc(C'), the function lu{v) counts the distinct 
prime factors of v, and c is an effective constant that depends only on C3. 

Proof. Let O, p, v, and a be as above. Let R be the order of index Sq in Ok, 
and let 7 be the presentation of R determined by the increasing sequence of primes 
that do not divide v. It follows from Theorem 3.3 that maxl{a.) < max^(7). 

Let Kn be the ring class field for R and let ttc{x, Kji/Q) count the primes 
bounded by x e R>o whose Probenius symbol (under the Artin map) lies in the 
conjugacy class C of Gal{KR/Q). Wc may bound [X/j iQ], Gal(i^/j/Q), and 
disc{K]i) by constants that depend only on C3, independent of O. Under the GRH, 
the Chebotarev bound of [41, Thm. 1.1] then yields 

Trc{x,Kii/Q) > C4x/logx, 

for some effective constant C4 e R>o and all x > 2, where C4 depends only on C3. 
For an effective constant c depending on C3, setting x = cuj{v) log(w(w) + 1) yields 
TTcix, Kji/Q) > w(i'). In this case the Frobenius symbol of at least one prime not 
dividing v lies in C, and this applies to every C. It follows that every class in c\{R) 
contains an element whose norm is a prime bounded by x that does not divide v. 
We then have max^(a) < max/(7) < x ~ cuj(v) \og{uj{v) + 1), as desired. □ 

The family of orders in Example 4.3 satisfies the requirements of Theorem 5.1. 
We note that provided logp = O(log^), we have lj{v) = 0(log^/loglog^), and the 
theorem then yields an O(logZ) bound on the norms 1(a). This is sharper than 
the more general 0{\og^ I) bound implied by [1]. In fact, by [32, Thm. 431], one 
expects oj{v) = O(loglogp), which yields a bound of 0(loglogZlogloglog/). 

5.3. Realizing the CM torsor. We now consider how to explicitly map c\{0) to 

the torsor Ellc)(Fp), so that we may then compute the action of any clement (or 
subgroup) of cl(C') on any element of Ellc)(Fp), without needing to compute any 
further isogenics. We use the presentation a of c\{0) suitable for p, and the table 
T : X{<y.) —>■ c\{0) described in Section 5.1. As above, we have a = ([[1], . . . , [Ik]), 
with norms l{a) = {li, . . . ,1^) and relative orders r(a) = (ri, . . . , r^). We assume 
the modular polynomials , . . . , are known, since the Zj are small. 
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To enumerate Ellc)(Fp) wc use [53, Alg. 1.3], but wc augment this algorithm 
to also compute an cxphcit bijcction 0: d{0) — >■ Ello(Fp) in which [a] G cl{0) 
corresponds to Jq. Given jo e Ellci(Fp), wc compute a path of Zfc-isogenies 

(11) Jo > Jl > J2 > ■■■ > Jvk-l-, 

where the ji are distinct elements of Elle)(Fp). As explained in Section 4.3, each step 
in this path is computed by finding a root jj of that lies in Elle)(Fp). 

When Ik splits in K wc have two choices for ji, and the correct choice may be 
determined using a kernel polynomial as outlined in Section 4.3. For i > 1 we use 
the polynomial — ji_2), which has exactly one root ji e Elle)(Fp). 

When fc > 1, the enumeration of ElIc)(Fp) proceeds recursively. For each ji 
in (11) we compute a path of Ik-i isogenics containing rfe_i distinct j-invariants. 
Eventually, every element of Elle)(Fp) is enumerated exactly once [53, Prop. 5]. 
For each j„ G Ellc)(Fp) we also compute a vector x G X{oi.) that describes the 
path used to reach j„ from jo, where Xi indicates the number of steps taken on an 
Z,-isogeny path. By correctly choosing the direction of each path, we ensure that 
each in is the image of jo under the cl(0)-action of = a^^ ■ ■ ■ a^* . This yields 
the desired bijection </>; since uniquely represents some (3 G cl(C'), we may set 
(j){a'") = jn, a process facilitated by the map T : X{a) d{0). 

The bijection (j) allows us translate any computation in the group c\{0) to the 
torsor Ello(Fp). In particular, by enumerating the cyclic subgroup H C cl(C') 
generated by [I], where [ is an ideal of norm I, we obtain the Z-isogeny cycle con- 
taining Jo, corresponding to the surface of one of the ^-volcanoes in Figure 1. Doing 
the same for each coset of H partitions El\o{Fp) into l-isogeny cycles. 

This may also be applied to the order i? = Z + lO. After obtaining a bijection 
from cl(i?) to Elli^(Fp), we enumerate the kernel of the map ip : c\{R) — > cl(0) from 
the exact sequence of (3) . Here we use one of the generators of norm P guaranteed 
by Lemma 3.2. Enumerating the cosets of ker<^ then partitions Elli{(Fp) into P- 
isogeny cycles of siblings with a common Z-isogenous parent in Ello(Fp). 

6. The algorithm 

Wc now prcscnit our algorithm to compute the modular polynomial <i>i using the 
Chinese Remainder Theorem (CRT). Algorithm 6.1 follows the standard pattern 
of a CRT-based algorithm; the details lie in Algorithm 6.2, which selects a set of 
primes S, and in Algorithm 2.1, which computes $j modulo each prime p G S. 
The computation of G Z[X, Y] may be viewed as a special case of computing 
G {Z/mZ)[X, Y], where m is the product of the primes in S. The choice of S 
ensures that this m is large enough to uniquely determine G Z[X, Y]. 

Algorithm 6.1. Let I be an odd prime, let to be a positive integer, and let O = J^{1) 
lie in a suitable family of orders J^. Compute G {Z/mZ)[X, Y] as follows: 

1. Compute the Hilbert class polynomial Hq G Z[X]. 

2. Select a set of primes S with Algorithm 6.2, using I and O. 

3. Perform CRT precomputation using S. 

4. For each prime p G S: 

a. Compute modp with Algorithm 2.1, using O and Hq. 

b. Update CRT data using mod p. 

5. Perform CRT postcomputation. 

6. Output G {Z/mZ)[X,Y]. 
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The suitable family of orders T is as defined in Section 4.2, sec Definition 4.2. 
The polynomial Hq computed in Step f may be obtained using any of several 
algorithms whose running time is quasi-linear in disc(C'), including [f f , 22, 53]. 

6.1. Selecting primes. The primes p in the set 5 selected by Algorithm 6.f must 
satisfy the conditions of Theorem 4.1 in order to use them in Algorithm 2.1. We 
require p to split completely in the ray class field Ki,q, but to not split completely 
in the ring class field for the order Z + Equivalently, we need p\ D to satisfy 

(12) 4:p = f~v^fD, 

with t = 2 mod I and l\ v, where D ~ disc(C'). To apply the CRT, we also require 

(13) l[\ogp>A\c\, 

pes 

for every coefBcient c of e Z[X, y]. Prom [12], we use the explicit bound 

(14) Bi = eilogl + 171, 

on the logarithmic height of to achieve this. We then have = 0(1), by (12). 

Heuristically, it is easy to find primes that satisfy (12). If D = 1 mod 8,Rxv = 2, 
otherwise fix t; = 1. Then, for increasing t = 2 mod I with the correct parity, test 
whether p = {f^ — v'^PD)/4 is prime. We expect to need O(nogZ) primality tests, 
and each can be accomplished in time polynomial in log I, although typically p is 
small enough to make an attempted factorization more efficient. We could obtain 
slightly smaller p's by letting v vary, but it is more convenient to fix v so that we 
can use the same presentation of cl(C') and cl(i?) for every p. This approach is easy 
to implement and very fast in practice. 

However, in order to prove Theorem 1 we must take a more cautious approach. 

Even assuming the GRH, we cannot guarantee wc will find any primes with a 
fixed value of v. On the other hand. Theorem 4.4 implies that if we construct 
random integers p < x satisfying (12). for sufficiently large x we have p prime 
with probability 0(1/ log 3;), and under the GRH, x = 0(/^(logZ)^) is large enough. 
Additionally, we would like to avoid w's with many prime factors, so that we may 
more profitably apply Theorem 5.1. By Lemma 8.1 of the appendix, the restriction 
w(i') < (log(logw + 3))^ eliminates a negligible proportion of the integers v G [1, .x]. 

We now present Algorithm 6.2, emphasizing that its purpose is to facilitate the 
proof of Theorem 1. In practice we use the heuristic procedure described above. 

Algorithm 6.2. Let I be an odd prime, let f < — 4 be a discriminant, and let Bi 

be as in (14). Construct the set S as follows: 

1. Set n ^ [El + 21og2)/log(Z2|D|/4) and then x ^ ^f\D\nlogn. 
Set 6 ^ and 5^0. 

2. Set T ^ 2xV2 and V ^ 2a;V2/-i|£)|-i/2. 

3. Repeat [2iVlog.T] times: 

a. Construct an integer p = {t^ — v^PD) / 4 using uniformly random integers 
V G [1, V] and t G [1, T], subject to I ] v, t = 2 mod I, and t = vD mod 2. 

b. If u}{v) > (log(log V + 3))2 then go to Step 3d. 

c. lip ^ S and p is prime then set S* 5 U {p} and 6 6 + logp. 

d. If 6 > B; + 2 log 2 then output S and terminate. 

4. Set X ^2x and go to Step 2. 
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In Step 3a, the integer t is generated an t = al + 2 using a uniformly random 
integer a G [0,V/l — 2]. The computation of u}{v) in Step 3b is performed by 
factoring v. Note that #5* < n, and p < x for all p & S. 

Lemma 6.3. Let T be a suitable family of orders, let I be an odd prime and let 
D = disc(J^(/)). Given inputs I and D, the expected running time of Algorithm 6.2 
is finite. Under the GRH we also have the following: 

(1) The expected running time is 0{l^'^'^), for any e G R>o- 

(2) There is a constant c < 1 such that for all I > 7 and k G Z>o, the algorithm 
terminates with log a; < (6 + fc) log/ with probability at least 1 — c~'^'°si_ 

Proof. To analyze Algorithm 6.2, we count the number of times Step 4 is executed, 

referring to the period between each execution as an iteration. By Theorem 4.4, 
the set of primes that satisfy Theorem 4.1, equivalently, those that satisfy (12), has 
positive density. Here we may use the natural density, via [38, Thm. 4.3.e]. For 
every fixed odd prime I, this implies a lower bound of f2(l/logx) on the probability 
that a random integer in [l,a;] is a prime that satisfies (12). Each integer p tested 
by Algorithm 6.2 necessarily satisfies (12), hence such ap is prime with probability 
$1(1/ log .t). By Lemma 8.1 in the appendix, the probability that a candidate p is 
skipped due to the test in Step 3b is o(l/loga;). This implies that for all suflS- 
ciently large x, the probability that Algorithm 6.2 terminates in a given iteration 
is bounded above zero, and the expected running time is finite. 

Now assume the GRH and let I > 7. Applying Theorem 4.4 with cq = 1, there 
are at least Z^(logZ)^ primes p < ciZ^(logZ)^ that satisfy (12), for some constant 
Ci G R>o that does not depend on I. Let xq be the least value of x > ciZ^(log ^)''). 
When X = XQwe have VT/l < 8ci/^(log^)^, since \D\ > l"^, and the probability that 
a given primality test succeeds is at least 8ci/logi > C2/loga;, for some constant 
C2 G R>o. From the inequality (7) in the proof of Theorem 4.4, one finds that this 
holds for all x > xq, with the same constants. As above. Step 3b has negligible 
impact, and for x > xq the probability that the algorithm terminates in a given 
iteration is at least c, for some constant c G R>o independent of I and x. 

We now consider the running time as a function of I, fixing an arbitrary e G R>o- 
It takes O(logZ) iterations to achieve x = xq, assuming that we don't terminate 
earlier, and we execute Steps 3b and 3c a total of 0(Z(logZ)^) times during this 
process. The computation of ui(v) and the primality test of p can both be achieved 
in expected time subexponential in log a;, by [44], yielding an 0(Z^+^) bound on the 
time to reach x = xq, since logxo = 0(log(Z)). 

For X > Xq, the probability of reaching each subsequent iteration declines expo- 
nentially, while the cost of Steps 3b and 3c grows subexponentially, implying that 
the total expected running time is also 0(1^^^), proving (1). 

Claim (2) follows from the same analysis. We have logcco = (6 + o(l)) log^ and 
add log 2 to log x in each iteration. Once a; = a;o, it takes more than k log / iterations 
to reach log x > (6 + fc) log I. There is a probability of at least c that the algorithm 
terminates in each subsequent iteration, yielding the bound in (2). □ 

6.2. CRT computations. The computations involved in Steps 3, 4b, and 5 of 
Algorithm 6.1 are described in detail in [53, §6]. We summarize briefly here. 

Given S = {pj, let M = Y[Pi, Mi = M/pi, and a, = M~'^ modp,. Let c 
denote a coeSicient of G Z[X, Y], and let Cj = c mod Pi denote the corresponding 
coeflacient of G Fp. [X, Y]. As in [55, §10.3], we can use fast Chinese remaindering 
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to cfBciently compute 

(15) c = CittiMi mod M. 

Provided that M > 2|c|, we can then Hft the result from Z/MZ to Z. 

When m is "large," by wliich wc moan logm = O(logM), we compute c mod M, 
hft to Z, and then reduce to TijmJL. In this scenario, Step 4b simply stores the 
coefficients Cj and Step 3 can be deferred to Step 5. 

When m is "small," by which we mean M(logm) = 0(log^ Hog log Z), we instead 
use the explicit CRT modulo m. Assuming M > 4|c|, we may apply 

(16) c = CiGiMi — rM mod m, 

where r is the closest integer to s = J2^i^i/Pi^ by [4, Thm. 3.1]. In this scenario, 
we update the sum C = ^CiaiMi mod m and an approximation to s in Step 4b as 
each Ci is computed. This uses 0(logm + log^ space per coefficient, rather than 
the 0{l\ogl) space used to compute cmodM. The postcomputation in Step 5 
determines r from the approximation to s and computes c mod m via (16). 
When m is neither small nor large, a hybrid approach is used, see [53, §6.3]. 

6.3. Computing $/(X, Y) mod p. An overview of Algorithm 2.1 was given in the 

introduction, we now fill in the details. 

Algorithm 2.1. Let I, p, and O be as in Theorem 4-1, with h{0) >l + 2, and let 
R = Z + lO. Given Hq e Z[X], compute e Fp[X, Y] as follows: 

1. Compute the presentations a of cl{0) and a' of cl(i?) suitable for p. 

2. Find a root of jo of Ha{X) over Fp. 

3. Use a to enumerate Ellc)(Fp) from jo, and identify the Z-isogeny cycles. 

4. For distinct jo,..- ,ji+i S Ello(Fp): 

a. Construct a curve Ei with j{Ei) = ji such that I divides #£^j(Fp). 

b. Generate a random point P € Ei{Fp) of order I. 

c. Use Ei and P to compute an Msogenous curve E'^/Fp via Algorithm 6.4. 

d. If i(S|) ^ Ello(Fp) set j'i <(- {E^), otherwise return to Step 3b. 

5. Use a' to enumerate Ellfl(Fp) from j^ and identify the Z^-isogeny cycles. 

6. For i from to Z + 1: 

a. Let jio,---,jii consist of the neighbors of ji in its Z-isogeny cycle in 
Ellc)(Fp) together with the Z^-isogeny cycle of Ellfl(Fp) containing j'^. 

b. Compute ^i{X,ji) = J2k o-ik^'^ as the product nfe(^ " jik)- 

7. For k from to / + 1: 

a. Interpolate 4>k € Fp[y] with deg^j, < Z + 1 satisfying 4>k{ji) = <ij/s- 

8. Output ^i{X,Y) = Y.k<t>k{Y)X^. 

Steps 2, 6, and 7 involve standard computations with polynomials over finite fields, 
as described in [55], for example. Step 1 is addressed in Section 5.1, and Steps 3 
and 5 are the topic of Section 5.3. Only Step 4 merits further discussion here. 

The existence of the curve Ei constructed in Step 4a is guaranteed by The- 
orem 4.1. The trace of Frobenius t of the desired curve is uniquely determined 
by the norm equation for p and the constraint t = 2 mod Z, as in (12). With 
k = ji/{1728 - jt), the curve E/Fp defined by y'^ ^ + ikx + 2k has j{E) = ji, 
and we may determine whether it is E or its quadratic twist that has trace t by 
attempting to generate a point of order Z on both curves in Step 4b. 



MODULAR POLYNOMIALS VIA ISOGENY VOLCANOES 



17 



To obtain a point P of order /, wc generate a random point Q uniformly dis- 
tributed over Ei{Fp), compute the scalar multiple P = nQ, where n = {p+l — t)/l, 
and then check that P ^ 0. This will be true with probability 1 — l/P, since 
Theorem 4.1 implies that the Sylow Z-subgroup of Ei{Fp) is Ei[l] ^ Z//Z x Z//Z. 
Thus we expect to succeed within 1 + 0{1/P) attempts, and the expected cost of 
generating P is O(logp) operations in Fp. 

Note that Ei{Fp) contains l + l distinct subgroups of order /, each corresponding 
to a distinct Wsogenous j-invariant. At most 2 of these lie in Ellc)(Fp). Thus in 
Step 4d we have i(-E-) ^ Elle)(Fp) with probability at least 1 — 2/(Z + l) and expect 
to need 1 + 0{l/l) random points P to obtain such an E'^. The curve E'^ is the 
image of an /-isogeny whose kernel is generated by P, obtained via Algorithm 6.4. 

6.4. Isogenies from subgroups. Let E/Fp be an elliptic curve. Given a cyclic 

subgroup H C E{Fp), Vein's formulas construct an isogeny E ^ E' with H as its 
kernel [54]. Typically H is specified by a polynomial whose roots in the algebraic 
closure Fp are the x-coordinates of the points in H . In our setting we may specify 
H as the subgroup generated by a point P S i?(Fp), and work entirely in Fp rather 
than an extension field. Additionally, the order Z of if is odd and p > 3, allowing 
us to simplify the formulas. 

Algorithm 6.4. Let / > 2 and p > 3 be primes, let E/Fp be an elliptic curve 
defined by y'^ = + Ax + B, and let P = {Px, Py) be a point on E{Fp) of order I. 
Compute the image E' /Fp of the Z-isogeny with kernel H = (P) as follows: 

1. Set t ^ 0, w ^ 0, and Q <s- P 

2. Repeat {I — l)/2 times: 

a. Set s 6Ql + 2 A and then set u 4(5^ + sQx- 

b. Set t ^ t + s, w ^ w + u, and Q Q + P. 

3. Set A' = A-5t and B' = B- 7w. 

4. Output the curve E'/Fp defined by y'^ ^ + A'x + B' . 

The addition Q + P in Step 2b is performed using the group operation in EiFp). 
The complexity of Algorithm 6.4 is 0{l) operations in Fp. 

6.5. Complexity analysis. We first bound the complexity of Algorithm 2.1, as 
used by Algorithm 6.1. 

Lemma 6.5. Let T he a suitable family of orders that satisfies the conditions of 
Theorem 5.1. For an odd prime I, let O = and let D = disc(C'). Let p be a 

prime in the set S selected by Algorithm 6.2 on input I and D. Assuming the GRH, 
the expected running time of Algorithm 2.1 is 0(Z^(logp)^ loglogp). 

Proof. We note that p = t^ — v'^PD > Z^, thus logZ < logp, and recall that the bit- 
complexity of multiplying two polynomials of degree 0{l) in Fp[Ar] may by bounded 
by 0{M{l\ogp)), using Kronecker substitution, see Corollaries 8.28 and 9.8 of [55]. 

In the analysis below we use 0{M{llogp)) = 0(/(logp)^ loglogp), via the bound 
M(n) = 0(n log n log log n) for fast multiplication [48]. To bound the cost of oper- 
ations in Fp, we use M(n) = 0(n^/(log n)'^), see Remark 6.6, and bound the cost 
of inversions by 0(M(logp) loglogp) = 0((logp)^), via [55, Cor. 11.10]. 

We now bound the (expected) cost of each step in Algorithm 2.1: 
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1. We have h{0) < h{R) = 0{P). As described in Section 5.1, the cost of com- 
puting the presentations ot and a' is O(Z^) operations in cl(C') and cl(i?). Us- 
ing binary quadratic forms, each group operation has complexity 0((logZ)^), 
by [5], yielding an 0{f{\oglf) = 0{P{\ogpf ) bound on Step 1. 

2. Using Bcrlckamp's probabilistic root-finding algorithm [3, §7] with a fast 
GCD computation [55, Alg. 11.4], the expected time to find a single root 
of Hjj G Fp[X] may be bounded by 0(M(/)(log/ + logp)) operations in Fp. 
This implies an 0{l{\ogpf') bound on Step 2. 

3. For p & S we have uj{v) < (loglogp)^, yielding an 0((loglogp)^logloglogp) 
bound on max 1(a). From Lemma 4.5, 0((loglogp)^ logp) operations in Fp 
suffice to compute the action of any element of a. We obtain an 0{l{\ogp)^) 
bound on the time to enumerate Elle)(Fp), which dominates the O(ZlogZ) 
time to identify the Wsogeny cycles. 

4. From the discussion in Section 6.3 and the complexity of Algorithm 6.4, we 
expect to use 0{P + llogp) operations in Fp during Step 4. This yields an 
0(;2(logp)2 -H;(logp)3) bound. 

5. Recall that the surjective map (p : c\{R) — )■ cl(0) in (3) preserves the norms of 
representative ideals. The subgroup of cl(i?) generated by invertible iJ-ideals 
with norms in l{a.) contains (p~^{c\{0)). It follows that the elements of a' 
with norm at most maxZ(a) generate a subgroup of size at least h{0) > I. 
All but 0{l) of the 0{P) steps taken when enumerating Ellii:(Fp) involve 
these elements, and, as in Step 3, we obtain a cost of 0{P{\ogpY) for these 
steps. Assuming the GRH, the remaining elements of ex' all have norm 
0((log|D|)2) = 0((logZ)2), via [1], yielding a cost of 0{l{\oglY{\ogpf) for 
these steps. Thus the expected time to enumerate Ell7^(Fp) is 0{P{\ogpY), 
which dominates the 0(Z^log/) time to identify the /^-isogeny cycles. 

6. Using a product tree we may compute Y\k{X—jik) in time 0(M(Z logp) log I), 
yielding a total cost of 0{Pi\ogpY loglogp) for Step 6. 

7. Using a product tree and fast interpolation [55, Alg. 10.11], we also obtain 
a cost of 0(l^(logp)^ loglogp) for Step 7. Here we use the 0{M{l\ogp)) 
bit-complexity of polynomial multiplication in Fp[X] to bound the cost at 
each level, rather than using the bound in [55, Cor. 10.12]. 

The bound 0(Z^(logp)^ loglogp) applies to every step, completing the proof. □ 
We are now ready to prove our main theorem. 

Theorem 1. Let T he the suitable family of orders in Example 4-3. Let I be an odd 

prime and let m € Z>o. Given inputs I, m, and O = T{1), Algorithm 6.1 correctly 
computes G (Z/mZ)[X, F]. Under the GRH, its expected running time is 

O(/Mog^noglog0, 
using Oiploglm) expected space. 

Proof. We first argue correctness. By Lemma 6.3, Algorithm 6.2 obtains a set of 
primes S that satisfy Theorem 4.1, with HpesJ^ > 4|c|, for every coefficient c of 
We now claim that for p G S, Algorithm 2.1 obtains, for each of Z + 2 distinct j- 
invariants ji, a list of / + 1 distinct j-invariants jik of /-isogcnous curves. Granting 
the claim, we may invoke standard properties of to show that Algorithm correctly 
interpolates e Fp[X,Y], see [56, Thm. 12.19] and [42, Thm. 5.3], for example. 
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The correctness of Algorithm 6.1 then fohows from the CRT and/or the exphcit 
CRT mod m, via [4, Thm. 3.1], as described in Section 6.2. 

As usual, let O = have fraction field K, and let i? = Z + lO. The claim 
above rests on three facts: (1) the explicit CM-action described in Section 4.3 
is correct, (2) any two Z^-isogenous elements of Elli^(Fp) must be Msogenous to 
exactly one and the same element of Ellc)(Fp), and (3) each P isogeny cycle in R 
contains exactly / — (^) elements. We note that (1) follows from the theory of 
complex multiplication and the properties of guaranteed by [56, Thm. 12.19], 
(2) follows from the ^-volcano structure, as shown by [28, §2.2] and [40, Prop. 23], 
and (3) is explicitly proven in Lemma 3.1. 

We now assume the GRH and bound the complexity of Algorithm 6.1. Lemma 6.3 

shows that the expected size of the largest p G is 0(log I), and we have #5 = Oil). 
Applying [53, §6] yields an 0(p log /to) space bound for to e Z>o. 

By [53, Thm. 1], the expected time to compute Ho in Step 1 is 0(Z^+^), and 
Lemma 6.3 gives an expected time of 0{l^^'^) for Step 2, for any e G R>o- Addi- 
tionally, we have \ogp > (6 + fc) logZ for all p G S with probability approaching 1 
exponentially as k increases. The time complexity of all remaining steps in Algo- 
rithm 6.1, including calls to Algorithm 2.1, depends polynomially on logp, hence 
we may bound the expected running time assuming logp = O(logZ). 

Regardless of the exact cutoff used, if M(logTO) = 0((log/)^loglog/) whenever 
we consider m "small", we may apply the results of [53, §6] to obtain a bound 
of 0(/"^ (log /)'^ log log/) on the expected time for all CRT computations, for every 
TO e Z>o- Since J" satisfies the conditions of Theorem 5.1, we may apply Lemma 6.5 
with logp = 0(log/) to obtain an 0(/^ (log Z)"^ log log/) bound on the expected time 
of each call to Algorithm 2.1. Applying #5* — 0(1) completes the proof. □ 

6.6. Remark on M(n). To bound the complexity of multiplication in Fp, and of 
polynomials of low degree, we assume only that M(n) = 0(n'^) for some c < 2, 

already achieved by Karatsuba's algorithm [55, Alg. 8.1]. In the practical range 
of /, the elements of Fp actually fit in a single 64-bit machine word and can be 
multiplied very quickly. We do assume FFT-based multiplication is used for mul- 
tiplying polynomials of degree / in Fp[X], and this assumption is met in practice; 
most of the computations described in Section 8 make heavy use of the FFT. 

6.7. Selecting a suitable order. The family of orders used in Theorem 1 suffices 
to prove the complexity bound, but we can simplify the implementation and improve 

performance with some additional constraints on the order O. Let us fix a bound 
b < I (say b = 256, for large /), and a small prime /q < / (typically Iq = 2). As 
above, R is the order of index / in O, and Ok is the maximal order. We seek an 
order O for which the following hold: 

(1) The conductor of O is 6-smooth, h{OK) < b, and h{0) >l + 2. 

(2) The groups cl(C) and cl(i?) are either generated by a single ideal with norm 
/o, or by two ideals with norms /q and /i, where h < b is ramified. 

The first condition ensures that O is suitable for / and allows us to to obtain a 
root of Ho{X) using only polynomials of degree at most b. This is accomplished 
by finding a root of Hok{X) and descending to the proper level of the /'-isogeny 
volcano for each prime V < h dividing the conductor of O, as in [53, §4.1]. The 
second condition allows us to realize the torsors for cl(C') and cl(i?) either by 
walking a single /o-isogeny cycle, or by walking two Zo-isogeny cycles connected by 
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a single /i-isogcny. In the latter case we orient the two cycles by computing one 
extra /i-isogeny. In both cases we avoid the need to ever distinguish the action of 
an ideal and its inverse, simplifying the computation described in Section 5.3. 

Subject to these conditions, we also wish to minimize h{0) > I + 2. To find 
such orders we enumerate fundamental discriminants (Ik < —4 with (^) = 1 and 
h{dx) < b, and for each cLk we select 6-smooth integers u for which h{v?dK) is 
slightly greater than I + 2 and test whether condition (2) holds. In practice we are 
almost always able to obtain O with h{0) within a few percent of / + 2. 

7. Modular functions other than j 

Let 5 be a modular function of level N, and let Z | A'' be a prime. We define the 
modular polynomial $^ of level I for g as the minimal polynomial of the function 
g{lz) over the field C{g). Much of the theory for the classical modular polynomial 
of the j-function generalizes to g. In particular, if the Fourier expansion of g has 
integer coefficients then we have $f € Z(£()[X]. The following lemma gives us 
further information in this case. 

Lemma 7.1. Let g be a modular function and let I be a prime not dividing the level 
of g. Suppose that $f has integer coefficients. If g is invariant under the action of 
either S={\-^)& SL2(Z) or M = ( J e GL2(Q), then we have 

Proof. The proof follows the symmetry proof for F), see [42, Thm. 5.3]. □ 

To apply our method we additionally require that $f have degree Z + 1. This is 
not true in general, but it does hold in many cases of interest. 

The polynomial $f should not be confused with the minimal polynomial of g as 
an element of C(j), which we denote '^^{X,,J). The polynomial depends only 
on g, not I, and we assume it is known (for our purposes, it effectively defines g). 
Given , our goal is to efficiently compute $f for a prime I \ N. 

We wish to adapt Algorithm 2.1 to compute '^^ E Fp[X]. We may then apply 
Algorithm 6.1 to recover over the integers or modulo some integer m via the 
Chinese Remainder Theorem. To simplify matters, we place some additional re- 
strictions on the order O that we use in Algorithm 6.1. Specifically, we require that 
there is a generator r G H of O = Z[r] with the property that 

c/(r) e Ko, 

where Kq is the ring class field for the order O. We say that g is a, class invariant 
for O in this case. If we now take a prime p that splits completely in and E/Fp 
an elliptic curve with endomorphism ring O, then the polynomial 

^3{X,j{E))eFp[X] 

has at least one root in Fp. Indeed, the value h = g{T) mod p for a prime p\p of 
Ko satisfies '^3(^h,j{E)) = 0. 

We can analyze how many roots the polynomial '^^{X, j{E)) G Fp[X] has using 
a combination of Deuring lifting and Shimura reciprocity. We refer to [10, §6.7] for 
a detailed description of the techniques involved and only state the result here. Let 

: H ^ C be the roots of ^'^(X, j). The functions gi are modular of level A^, 
and they are permuted by the Galois group GL2(Z/7VZ) of the field of all modular 
functions of level A''. To state our result, we will associate a matrix A e GL2(Z/A''Z) 



MODULAR POLYNOMIALS VIA ISOGENY VOLCANOES 



21 



to the Frobenius morphism of E as follows. Fix an isomorphism End(ii^) — > O and 
let TTp € O be the image of the Frobenius morphism. If TTp has minimal polynomial 
— tX +p of discriminant A = — Ap, then we put 

(17) ^ = (T ^ GL2(Z/iVZ). 

Theorem 7.2. Let g he a modular function with the property that has integer 

coefficients and is separable modulo a prime p that does not divide the level of N . If 
E/Fp is an elliptic curve with endomorphism ring O and jo = then we have 

4f{x e Fp : jo) = 0} = 4{9i ■ -^'{guj) = and gf = 5J, 

where A is the matrix in (17). 

Proof See [10, §6.7]. □ 

To check if gf = gi holds is a standard computation, see [30] for example. 
Although the matrix A has norm p and trace t and therefore depends on p, we can 
often derive a result that merely depends on a congruence condition on p mod A''. 
Lemma 7.3 in Section 7.3 gives an example. 

7.1. Computing modular polynomials for 72. Let j2{z) denote the unique 
cube root of j{z) that has integral Fourier expansion. It was known to Weber 
already that 72 is a modular function of level 3, sec [57, §125], and 72 is a class 
invariant for O whenever 3 j disc(C'). We have 'i!^^{X,j) = X^ — j, and in this 
simple case there is no need to apply Theorem 7.2; if we restrict to p = 2 mod 3 
then every element of Fp has a unique cube root. Thus we can compute ^'j^'^ with 
only minor modifications to Algorithm 6.1: 

• Use a suitable order O with 3 | disc(C') and select only primes p=2 mod 3. 

• After Step 5 of Algorithm 2.1, replace each element of El\o{Fp) and Elli{(Fp) 

with its unique cube root in Fp. 

These changes suffice, but we can also improve the algorithm's performance. 
First, Lemmas 2-3 and Corollary 9 of [12] yield the bound 

(18) =2/logZ + 8/ 

on the logarithmic height of (conjecturally, B'j'^ = 2ZlogZ + 4Z for all I > 60, but 

wc do not use this). Thus we can reduce the height bound Bi in (14) by a factor of 
approximately 3 when computing This reduces the number of primes p G 5, 
and the corresponding number of calls Algorithm 6.1 makes to Algorithm 2.1. 

Second, we may take advantage of the fact that is sparser than As noted 
in [21, p. 37], the coefficient of X°-Y'^ in is zero unless 

(19) a + lb=l + l mod 3. 

The proof of this relation goes back to Weber: the argument given in [57, p. 266] 
generalizes immediately to 72. It allows us to reduce the number of points we use 
to interpolate by a factor of approximately 3. 

Let n= \{l + l)/3] + 1. When selecting a suitable order O as in Section 6.7, we 
now only require that h{0) > n, and further modify Algorithm 2.1 as follows: 

• In Steps 4-6 we construct just n polynomials $^^(X, of degree I + 1. 

• In Step 7 we interpolate I + 1 polynomials 4>^ of degree less than n by writing 

= Y''(l)l{Y^), with c e {0, 1, 2} satisfying c + lk = l + l mod 3. 
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This reduces the cost of aU the significant components of Algorithm 2.1 by a factor 
of approximately 3. The reduction in the cost of the interpolations in Step 7 is 
actually greater than this, since its complexity is superlinear in the degree. 

The total size of ^1'^ is approximately 9 times smaller than and with the 
optimizations above, the time to compute it is effectively reduced by the same 
factor. A small amount of additional time is required to compute the cube roots of 
the elements in Ellc)(Fp) and 'EAlniYp), but even this can be avoided. 

Provided we have already computed for some small values of V (specifically, 
for the primes /q and li of Section 6.7), we may use these polynomials to directly 
enumerate sets Ell^)^ (Fp) and Ell^^ (Fp) containing the cube roots of the elements in 
Ello(Fp) and Ellij(Fp) respectively. We need only compute the cube roots of jo and 
j'q as starting points. This third optimization yields a small but useful improvement 
in the case of 72, and plays a critical role in the examples that follow. 

7.2. Recovering $; from Having computed , we note that may be 
computed via [21, Eq. 23]: 

(20) $,(X3,y3) = ^]^{X,Y)^f{X,ujY)^]^{X,uj'^Y), 

where w = e^^*/"^. For computation in Z, or modulo m, it is more convenient to 
express ^'J'^ in terms of polynomials Po,-Pi)-P2 S Z[X, F] satisfying 

(21) ^f{X,Y) = PQ{X^,Y^)Y^ + Pi{X^,Y^)XY + P2{X^,Y^)X'^Y'^-^, 
where 6 = 2 when I = 1 mod 3, and 6 = when I = 2 mod 3. We then have 

(22) $i = P^Y'' + (Pf - 3PqPiP2)XY + P^X'^Y'^'K 

Using Kronecker substitution and fast multiplication, it is possible to evaluate (22) 
in time 0(Z'^(log Z)^+^), which is asymptotically faster than Algorithm 6.1. This 
suggests that we might more efficiently compute by recovering it from , but 
we do not find this to be true in practice: it actually takes longer to evaluate (22) 
than it docs to compute directly. This can be explained by two factors. First, 
the Fp-operations used in Algorithm 2.1 effectively have unit cost for word-size 
primes, making it faster than Theorem 1 would suggest for all but very large I. 
Secondly, the evaluation of (22) becomes extremely memory intensive when / is 
large. However, if we are computing mod m with logm <C / log^, then the time 
to apply (22) modulo m is negligible. In this situation it is quite advantageous to 
derive mod m from ^'J'^ mod m, as may be seen in Table 3 of Section 8. 

7.3. Computing modular polynomials for the Weber f function. We now 

consider the classical Weber function [57, p. 114] defined by 

— ^) — ' 

where ^48 = e** and 77(0) is the Dedekind eta function. This is a modular function 
of level 48 that satisfies 72 = (f^^ — 16)/ see [57, p. 179], thus we have 

¥{X,j) = {X^^-16f-X^^j. 

Asymptotically, we expect to be able to reduce the height bound Bi by a factor of 
degx '^^Sj = 72 when computing One can derive an explicit bound along 
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the lines of (18), but this tends to overestimate the 0{l) term quite significantly, 
so in practice for large I we use the heuristic bound 

(23) Bl=^nogl+^l (/> 2400), 

which has been verified for every prime I between 2400 and 10000. The modular 
polynomial $J is also sparse: the coefficient of X'^Y^ can be nonzero only when 

(24) la + b=l + l mod 24, 

as shown in [57, p. 266]. Thus <1>J is roughly 72 • 24 = 1728 times smaller than 
By applying the technique described above for 72 , mutatis m,utandi, we can actually 
compute more than 1728 times faster than for large values of /, as may be 
seen in see Table 2 of Section 8. When applying Algorithm 6.1, wc now insist that 
the O have discriminant D = 1 mod 8 and 3 \ D, since the Weber function yields 
class invariants in (at least) this case, sec [30], for example. 

Since — f will also yield class invariants for O, the polynomial ^^{X,jo) will 
always have at least two roots. The following lemma tells us that we can impose a 
congruence condition on p to ensure that we have exactly two roots. 

Lemma 7.3. Let p = 11 mod 12 be prime and let jo be the j -invariant of an 
elliptic curve E/Fp with End{E) isomorphic to an imaginary quadratic order O 
with discriminant D = 1 mod 8 and 3 t D. Then *f(X, jo) S Fp[X] has exactly 
two roots in Fp, and these are of the form Xq and —Xq. 

Note that if the lemma applies to O, it also applies to the order R of index I in O. 

Proof. We only have to apply theorem 7.2. The action of A on the roots of "^^{X, jg) 
is computed in [10, §6.7], and this yields the lemma. □ 

Given a j-invariant jo € Fp that corresponds to j(To) € Kq, we cannot readily 
determine which of the roots xq and —xq of ^''(X, jo) actually corresponds to f(To). 
The functions f and — f yield distinct class invariants, but they share the same 
modular polynomials, since <^1{X,Y) = $f(-X, ^Y) = $;"f(X,y), by (24). 

Thus for the initial jo obtained in Step 2 of Algorithm 2.1, it does not matter 
whether we pick xq or — xq as a root of ^''(X, jo), and we need not be concerned 
with making a consistent choice for each prime p. However it is critical that while 
computing $J mod p we make a consistent choice of sign for each j-invariant we 
convert to an "f- invariant" (a root of 4'f(A', jj) modp). This makes it impractical 
to enumerate j-invariants and convert them en masse. Instead, as described for 
72 above, we use modular polynomials for small I' to enumerate sets Ell^(Fp) 
and Ell^(Fp) from starting points xq and .Tq satisfying ^'^(xo, jo) = 5'^(xq, jp) = 0. 
This ensures that signs are chosen consistently within each of these sets, we only 
need to check that the sign choices for the two sets are consistent with each other. 

To do so, we use the fact that the coefficient of X'y' in ^1{X,Y) is -1. This 
is shown for $; in [57, §69], and the same argument applies to (E>J. Wc modify 
Algorithm 2.1 to compute the coefficient of X'-Y'- in $| modp in between Steps 5 
and 6. We do this twice, switching the signs in Ell£,(Fp) the second time, and expect 
exactly one of these computations to yield —1, thereby determining a consistent 
choice of signs. This test should be regarded as a heuristic, since we do not rule out 
the possibility that both choices produce —1. However, in the course of extensive 
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testing this has never happened, and we suspect that it cannot. If it does occur, 
the algorithm will detect this, and can then simply choose a different prime p. 

7.4. Eta quotients and Atkin modular functions. For a prime N, let 

'T]{Nzy " 



where s = 24/ gcd(12, TV — 1). These arc modular functions of level TV, and the 
polynomials "^n = that relate /at to j are sometimes called canonical modular 
polynomials [18, p. 418]. The functions /jv are closely related to the functions tt)^ 
considered in [24]. and in fact "iif" = so what follows applies to both. When 

N is 2, 3, 5, 7, or 13, we have deg^ \E'jv = 1 and can adapt Algorithm 6.1 to compute 
polynomials for odd primes I f N. We assume here that A'' is also odd. 

We have deg^ '^'n — N + 1, hence we can reduce the height bound Bi by a factor 
of approximately A'^ + 1. When selecting a suitable order O, we require that A'' is 
prime to the conductor and splits into prime ideals that are distinct in cl(0). This 
assumption is stronger than we need, but it simplifies the implementation. For the 
primes p € S we require that N is prime to v, where Ap = — disc(C'). 

As shown in [46], the polynomial ^t^iX^jo) modp has the same spHtting type 
as $Ar(X, jo) mod p. In particular, for jo € Ellci(Fp) (or jo € Elli{(Fp)) it has 
exactly two roots, say X\, and x^- These correspond to A'^-isogenies as follows: 
the j-invariants ji and of the two elliptic curves that are A''-isogenous to jo are 
uniquely determined by the relations ^'jv(A'^^/xi, ji) = and ^ nij^'^ I^^^h)- Here 
the transformation x /x realizes the Atkin-Lchncr involution on fN{z). 

Starting points xq and x'^ corresponding to jo and jp are chosen as follows. 
Let x'q be a root of '^MiX.j'o), chosen arbitrarily, and let j[ be determined by 
NiN'^ /x'Q,j'i) = 0. We then use Vein's formulas to obtain the j-invariant ji 
of the elliptic curve that is Z-isogenous to j[ (there is exactly one and it lies in 
Ello(Fp), since j[ € Ellij(Fp) is on the floor of its ^-volcano). Finally, xq is uniquely 
determined by the constraints ^'jv(a;o, jo) = and '^n{N'' /xo,ji) = 0. 

We next consider double eta-quotients [25, 26] of composite level A'' = piP2: 

where pi ^ P2 are primes and s = 24/ gcd(24, {pi — l)(p2 — !))• For (pi,_P2) in 
{(2, 3), (2, 5), (2, 7), (2, 13), (3, 5), (3, 7), (3, 13), (5, 7)}, 

the polynomial '^pi,p2 = \l/"pi'P2 has degree 2 in j and we can compute ^^"^'"'-^ for 
odd primes I \ N. Our restrictions on O are analogous to those for /jv or tt)^: we 
require that A'^ is prime to the conductor and that both pi and p2 split into distinct 
prime ideals in 01(0). Our requirements for p S iS are as above. We can reduce the 
height bound Bi by a factor of approximately (pi + l)(p2 + l)/2. 

With the double eta-quotients, the polynomial ^'p^.p^ (X, jo) has four roots, cor- 
responding to four distinct isogenics of (composite) degree A^. Each root Xi uniquely 
determines the j-invariant of a curve AT-isogenous to E/¥p as the unique root of 
^fpj^^p^ (.Tj, J)/( J — jo) G Fp[J]. The double eta-quotients are invariant under the 
Atkin-Lehner involution, so we need not transform Xj. With this understanding, 
the procedure for selecting xq and ocq is cis above. 
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In some cases one can obtain smaller modular polynomials by considering suit- 
able roots of the functions defined above. For example, a sixth root of fs also 
yields class invariants (this is shown for tnl in [24]), and the corresponding modular 
polynomials are sparser and of lower height (by a factor of 6). 

Our algorithm also applies to the Atkin modular functions, which we denote A^. 
These are (optimal) modular functions for Xq (iV) invariant under the Atkin-Lehner 
involution, see [21, 45] for further details. The polynomials ^'^'^ are known as 
Atkin modular polynomials, and are available in computer algebra systems such as 
Magma [14] and Sago [51]. For primes N < 32, and also N in the set {41, 47, 59, 71}, 
we have deg^- — 2 and can compute polynomials '^f '^' for odd primes I ^ N. 
This is done in essentially the same way as with the double eta-quoticnts, except 
that now N is prime and vJi-^n (X, jo) has just two roots, rather than four. For these 
Aff, the height bound can be reduced by a factor of approximately (N + l)/2. 

Finally, we note an alternative approach applicable to both eta-quotients and 
the Atkin modular functions. If wc choose O so that the prime factors of N are 
all ramified, then there is actually a unique xq G Fp corresponding to each jo 
in Ello(Fp) and Elli^(Fp), that is, \I'^(X,jo) has exactly one root in Fp. In this 
scenario we can simply enumerate j-invariants as usual and then replace each ji 
with a corresponding Xi. This is not as efficient and places stricter requirements 
on O, but it allows us to compute $f without needing to know for any I'. This 
provides a convenient way to "bootstrap" the process. In fact all of the modular 
polynomials we have considered can eventually be obtained via Algorithm 6.1, 
starting from the polynomials and $2- 

8. Computational results 

We have applied our algorithm to compute polynomials $f for all the modular 

functions discussed in Section 7 and every applicable I up to 1000. For the functions 
j, 72, and f we have gone further, and present details of these computations here. 

8.1. Implementation. The algorithms described in this paper were implemented 
using the GNU C/C++ compiler [50] and the CMP library [31] on a 64-bit Linux 
platform. Multiplication of large polynomials is handled by the zn_poly library 
developed by Harvey [33, 34]. 

The hardware platform included four 3.0 GHz AMD Phenom II processors, each 
with four cores and 8GB of memory. Up to 16 cores were used in the larger tests, 
with essentially linear speedup. For consistency we report total CPU times, noting 
that in a multi-threaded implementation, disk and network I/O can be overlapped 
with CPU activity so that all computations are CPU bound. 

As a practical optimization, we do not use the Hilbert class polynomial Hq in 
Step 1 of Algorithm 6.1. Instead, we compute the minimal polynomial of some 
more favorable class invariant, as described in [27], which is then used to obtain 
a j-invariant. Additionally, as noted in Section 6.7, it suffices to compute a class 
polynomial for the maximal order containing O. With these optimizations the time 
spent computing class polynomials is completely negligible (well under one second). 

Another important optimization is the use of polynomial gcds to accelerate root- 
finding when walking paths in the isogeny graph, a technique developed in [27, §2]. 
This greatly accelerates the enumeration of the sets Ello(Fp) and Ell7j(Fp) in Steps 
3 and 5 of Algorithm 6.1. As a result, most of the computation (typically over 75%) 
is spent interpolating polynomials in Steps 6 and 7. 
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I 


\D\ 


n 


Bi 


bi 


size (MB) 


time (s) 


MB/s 


101 


216407 


184 


6511 


5751 


2.65 


2.25 


1.18 


211 


393047 


369 


14949 


13359 


27.6 


14.4 


1.92 


307 


837407 


531 


22748 


20483 


90.5 


51.0 


1.78 


401 


626431 


725 


30640 


27642 


211 


130 


1.62 


503 


3076175 


870 


39421 


35686 


431 


264 


1.63 


601 


461351 


1011 


48027 


43542 


755 


485 


1.56 


701 


1254871 


1229 


56953 


51731 


1227 


863 


1.42 


809 


916599 


1376 


66731 


60743 


1926 


1410 


1.37 


907 


986855 


1517 


75712 


69017 


2759 


2010 


1.37 


1009 


2871983 


1728 


85157 


77653 


3857 


2910 


1.32 


2003 


91696103 


3410 


180941 


166095 


33120 


31800 


1.04 


3001 


248329639 


5122 


281635 


259272 


117256 


143000 


0.82 


4001 


72135279 


6939 


385300 


355707 


287783 


363000 


0.79 


5003 


67243191 


8373 


491355 


454429 


577740 


749000 


0.77 



Table 1. Computations of over Z. 



8.2. Computations over Z. Tables 1 and 2 provide performance data for com- 
putations of and using Algorithm 6.1. For each I we list: 

• The discriminant D of the suitable order O. 

• The number of CRT primes n = =ffS used. 

• The height bound Bi in bits and the actual bit-size bi of the largest coefficient. 

• The total size of (rcsp. $[) in megabytes (1MB = 10^ bytes), computed 
as the sum of the coefficient sizes, with symmetric terms counted only once. 

• The total CPU time, in seconds. This includes the time to select O. 

• The throughput, defined as the total size divided by the total CPU time. 

In the last column of Table 1 one can see the quasilinear performance of Algo- 
rithm 6.1 as a function of the size of and the constant factors appear to be 
advantageous relative to other algorithms. For example, computing $1009 with the 
evaluation/interpolation algorithm of [23] uses approximately 100000 CPU seconds 
(scaled to our hardware platform), while Algorithm 6.1 needs less than 3000. 

The first five rows of Table 2 may be compared to the corresponding rows of 
Table 1 to see the performance advantage gained when computing modular poly- 
nomials for the Weber f function rather than j. As expected, these polynomials are 
approximately 1728 times smaller, and the speedup achievcid by Algorithm 6.1 is 
even better; we already achieve a speedup of around 1800 when / = 1009, and this 
increases to to over 3000 when I = 5003. This can be explained by the superlinear 
complexity of interpolation, as well as the superior cache utilization achieved by 
condensing the sparse coefficients of as described in Section 7.1. 

As noted in Section 7.3, we used a heuristic height bound for the computations 
in Table 2. The gap between the values of bi and Bi in each case gives us high 
confidence in the results (the probability of this occurring by chance is negligible). 

8.3. Computations modulo m. Table 3 gives timings for computations of 
modulo 256-bit and 1024-bit primes m. The values of m are arbitrary, and, in par- 
ticular, they are not of a form suitable for direct computation with Algorithm 2.1. 
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I 


1^1 


n 


Bi 


bi 


size (MB) 


time (s) 


MB/s 


1009 


1391 


33 


1275 


1099 


2.34 


1.59 


1.47 


2003 


37231 


58 


2542 


2271 


19.5 


10.7 


1.81 


3001 


88879 


88 


3822 


3611 


69.6 


47.7 


1.46 


4001 


53191 


112 


5201 


4801 


167 


116 


1.45 


5003 


30959 


136 


6613 


6228 


339 


241 


1.41 


6007 


463039 


170 


8052 


7530 


595 


493 


1.21 


7001 


150631 


192 


9496 


8876 


957 


701 


1.37 


8009 


315031 


220 


10979 


10292 


1453 


1200 


1.21 


9001 


179159 


240 


12453 


11974 


2123 


1790 


1.18 


10009 


207919 


265 


13964 


13453 


2953 


2630 


1.12 


20011 


1114879 


537 


29485 


27860 


24942 


27600 


0.90 


30011 


2890639 


795 


45649 


43304 


87660 


123000 


0.71 


40009 


22309439 


1032 


62210 


59439 


214273 


335000 


0.64 


50021 


37016119 


1316 


79116 


78077 


508571 


677000 


0.75 


60013 


27334823 


1594 


96165 


91733 


747563 


1150000 


0.65 



Table 2. Computations of over Z. 



Instead, Algorithm 6.1 derives $j mod m from the computations of modp, for 
p G S, using the explicit CRT. The same set S is used as when computing over Z, 
so the running time is largely independent of m, but using the explicit CRT (mod 
m) yields a noticeable speedup when logm is significantly smaller than 6nogL For 
example, when I = 1009 it takes approximately 2300 seconds to compute mod m, 
for the m listed in Table 3, versus about 2900 seconds to compute over Z. 

In addition to computing mod m directly, we may also obtain mod m by 
first computing mod m and then applying (22), as discussed in Section 7.1. 
The time to compute mod m is essentially independent of to, but the time to 
apply (22) is not. Even so, for the 256-bit and 1024-bit m that we used, computing 

mod m in this fashion is much faster than computing mod m directly; for 
I = 1009 we achieve times of 223 and 403 seconds, respectively. As when computing 
this speedup improves super linearly, and for large I is noticeably greater than 
the expected factor of 9. 

When computing ^'^^ mod to we used the height bound Bj^ = 21 log / + 81 given 
by (18). The timings in Table 3 would be further improved if the heuristic bound 
B'l'^ = 21 log I + Al were used instead. While we conjecture that this bound holds 
for all / > 60, caution is warranted when applying heuristic bounds to computa- 
tions performed modulo to: the impact of an incorrect height bound may not be 
immediately apparent, as it is when computing over Z. 

The computations listed in Tables 1 and 2 were practically limited by space, not 
time. The largest computations took only a day or two when run on 16 cores, but 
required nearly a terabyte of disk storage. However when computing mod to, 
we can handle larger values of I without using an excessive amount of space. When 
/ = 20011, for example, the total size of is over 30 terabytes, but we are able to 
compute modulo a 256-bit integer to using less than 10 gigabytes. 
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m = 


2256 _ 


189 


m = 


21024 _ 


105 






$* 




*7 


$* 


101 


2.12 


0.16 


0.47 


2.16 


0.17 


1.53 


211 


12.4 


1.64 


3.26 


12.7 


1.68 


7.95 


307 


43.3 


4.82 


8.34 


44.0 


4.93 


19.3 


401 


109 


10.9 


17.9 


111 


11.1 


38.0 


503 


215 


23.3 


34.0 


219 


23.8 


66.4 


601 


390 


40.5 


55.8 


395 


41.4 


110 


701 


695 


69.1 


90.2 


703 


70.3 


158 


809 


1130 


105 


134 


1150 


107 


222 


907 


1590 


158 


194 


1600 


160 


306 


1009 


2300 


223 


267 


2320 


225 


403 


2003 


23900 


2400 


2590 


24100 


2440 


3210 


3001 


106000 


9250 


9650 


107000 


9360 


11200 


4001 


283000 


25100 


25900 


287000 


25400 


28600 


5003 


647000 


57000 


58300 


653000 


60200 


65700 


10009 


7180000 


681000 


687000 


7320000 


688000 


713000 



Table 3. Computations of and modulo m. 
Columns list the total time to obtain by computing and applying (22). 
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Appendix 

Lemma 8.1. Let e and c be any positive real constants. Let Tre{x) count the integers 
n e [3,x] with more than (loglogn)^+^ prime factors. Then ^^{x) is o{x/{logxy). 

Proof. Let Tk{x) count the integers in [1, x] that factor into exactly k primes, which 
need not be distinct. For x > e"^ let / denote the interval [(logloga;^/^)^"'"^,log2a;]. 
For all sufficiently large x we have 

(25) Mx) < x'^^ 

kei 

Clearly a;^/^ = o(a;/ log" x), so we only need to bound the sum. By [32, Thm. 437], 

^ fc.doglog.)'--^ 
k'. iogx 

For all sufficiently large x, one finds, via Stirling's approximation, that Tk{x) is a 
monotonically decreasing function of k on the interval I and therefore 

^Tk{x) < (log2a;)Tp(iogiog^i/2)i+e](a;). 

kei 

Taking logarithms and applying log(n!) = nlogn — n + O(logn), one obtains 

(26) log(^^^Tk{x)j < Iogx — £(logloga;^/^)^^^logloglogx + 0((loglogx)^^^). 

kei 
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For any fixed cq G R>o and all sufficiently large x the RHS of (26) is smaller than 
log(coa;/log'^a;). Thus X^fcg/T/j(a;) is o(a;/(loga;)'^), as desired. □ 
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